Rank-Approximate Nearest Neighbor Search: Retaining Meaning and Speed in High Dimensions

Authors

  • Parikshit Ram
  • Dongryeol Lee
  • Hua Ouyang
  • Alexander G. Gray
Abstract

The long-standing problem of efficient nearest-neighbor (NN) search has ubiquitous applications ranging from astrophysics to MP3 fingerprinting to bioinformatics to movie recommendations. As the dimensionality of the dataset increases, exact NN search becomes computationally prohibitive; (1+ε) distance-approximate NN search can provide large speedups but risks losing the meaning of NN search present in the ranks (ordering) of the distances. This paper presents a simple, practical algorithm allowing the user to, for the first time, directly control the true accuracy of NN search (in terms of ranks) while still achieving large speedups over exact NN. Experiments on high-dimensional datasets show that our algorithm often achieves faster and more accurate results than the best-known distance-approximate method, with much more stable behavior.
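
To make the idea of controlling accuracy "in terms of ranks" concrete, here is a minimal Python sketch. It is not the algorithm from the paper: it is a plain random-sampling illustration under stated assumptions, and the names `min_sample_size`, `rank_approx_nn`, and `true_rank` are hypothetical. The sketch picks the smallest uniform sample (without replacement) that contains at least one of the `max_rank` closest points with probability at least `success_prob`, then runs exact search on that sample only.

```python
import math
import random


def min_sample_size(n_points: int, max_rank: int, success_prob: float) -> int:
    """Smallest sample size such that a uniform sample without replacement
    contains one of the `max_rank` closest points with probability
    at least `success_prob`."""
    if not 1 <= max_rank <= n_points:
        raise ValueError("max_rank must lie in [1, n_points]")
    if not 0.0 < success_prob < 1.0:
        raise ValueError("success_prob must lie strictly between 0 and 1")
    miss = 1.0  # probability that every draw so far missed the top max_rank points
    for n in range(1, n_points + 1):
        remaining = n_points - (n - 1)  # points not yet drawn before this draw
        miss *= (remaining - max_rank) / remaining
        if 1.0 - miss >= success_prob:
            return n
    return n_points


def rank_approx_nn(points, query, max_rank, success_prob, rng):
    """Index of the closest point within a random sample of `points`.
    With probability >= success_prob its true rank is <= max_rank."""
    n = min_sample_size(len(points), max_rank, success_prob)
    sample = rng.sample(range(len(points)), n)
    return min(sample, key=lambda i: math.dist(points[i], query))


def true_rank(points, query, index):
    """1-based rank of points[index] by distance to query (exact, brute force)."""
    d = math.dist(points[index], query)
    return 1 + sum(math.dist(p, query) < d for p in points)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [tuple(rng.random() for _ in range(20)) for _ in range(2000)]
    q = tuple(rng.random() for _ in range(20))
    idx = rank_approx_nn(data, q, max_rank=20, success_prob=0.95, rng=rng)
    print("sample size:", min_sample_size(len(data), 20, 0.95))
    print("true rank of returned point:", true_rank(data, q, idx))
```

The sample size depends only on the dataset size, the rank bound, and the success probability, not on the dimensionality. That is the sense in which a rank guarantee keeps its meaning in high dimensions, where distances to different points tend to be nearly equal.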

Similar resources

Nearest Neighbor Search using Kd-trees

We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than three. Since the exact nearest neighbor search problem suffers from the curse of dimensi...

An Improved Algorithm Finding Nearest Neighbor Using Kd-trees

We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search resulting in an improved performance. The Kd-tree data structure seems to work well in finding nearest neighbors in low dimensions but its performance degrades even if the number of dimensions increases to more than two. Since the exact nearest neighbor search problem suffers from the curse of dimension...

Approximate Nearest Line Search in High Dimensions

We consider the Approximate Nearest Line Search (NLS) problem. Given a set L of N lines in the high dimensional Euclidean space R^d, the goal is to build a data structure that, given a query point q ∈ R^d, reports a line ℓ ∈ L such that its distance to the query is within (1+ε) factor of the distance of the closest line to the query point q. The problem is a natural generalization of the well-studi...

Implementing a Parallel Dynamic Approximate Nearest Neighbor Search Algorithm

We describe the implementation of a fast, dynamic, approximate, nearest-neighbor search algorithm that works well in fixed dimensions (d ≤ 5), based on sorting points coordinates in Morton (or z-) ordering. Our code scales well on multi-core/cpu shared memory systems. Our implementation is competitive with the best approximate nearest neighbor searching codes available on the web, especially fo...

The Analysis of a Probabilistic Approach to Nearest Neighbor Searching

Given a set S of n data points in some metric space and a query point q in this space, a nearest neighbor query asks for the nearest point of S to q. Throughout we will assume that the space is real d-dimensional space R^d, and the metric is Euclidean distance. The goal is to preprocess S into a data structure so that such queries can be answered efficiently. Nearest neighbor searching has ap...

Publication date: 2009